Lets do a short introduction to what we’re working with here.
#original Siebert Data
aei <- read.csv2("/Volumes/RachelExternal/Thesis/Data/SiebertData/HID_v10/Country_tenyear_ha.csv")
aei <- as.data.frame(aei)
#rename the cols so that I can use them as number
colnames(aei) <- c("country", "ISO", "ID", "1900", "1910","1920","1930","1940","1950","1960","1970","1980","1985","1990","1995","2000","2005")
#tidying data to long format
aei <- pivot_longer(aei,cols = 4:17, names_to = "year", values_to = "aei_ha")
aei$aei_ha <- removePunctuation(aei$aei_ha)
aei$year <- as.numeric(aei$year)
aei$aei_ha <- as.numeric(aei$aei_ha)
aei$ISO <- as.factor(aei$ISO)
aei$country <- as.factor(aei$country)
#subset the data so that its from the year 1960
aei <- aei %>% subset(year >= 1960)
#make a new col for year count
aei$yearcount <- (aei$year - 1960)
#remove data for the total global irrigated area
globeaei <- subset(aei, country == "WORLD")
aei= aei[!aei$country == "WORLD",]
str(aei)
## tibble [1,848 × 6] (S3: tbl_df/tbl/data.frame)
## $ country : Factor w/ 232 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 2 2 ...
## $ ISO : Factor w/ 232 levels "","ABW","AFG",..: 3 3 3 3 3 3 3 3 6 6 ...
## $ ID : int [1:1848] 2 2 2 2 2 2 2 2 6 6 ...
## $ year : num [1:1848] 1960 1970 1980 1985 1990 ...
## $ aei_ha : num [1:1848] 2404990 2408168 2518178 2594394 2903569 ...
## $ yearcount: num [1:1848] 0 10 20 25 30 35 40 45 0 10 ...
Added to this to be able to calculate the % of land irrigated per country we need the country area. This data is taken from Worldbank.
#import the data set with total country area, this comes from the Worldbank
countryarea <- read.csv2("/Volumes/RachelExternal/Thesis/Data/LandArea/LandArea.csv")
#removing extra cols
countryarea <- countryarea[-c(265:270), ]
countryarea <- countryarea[, -c(1, 3:4)]
#fix the col names (in the two days since i found this little for loop I learned how to do this faster!)
for ( col in 2:ncol(countryarea)){
colnames(countryarea)[col] <- sub("X", "", colnames(countryarea)[col])
}
remove(col)
#making long data
countryarea <-
countryarea %>%
pivot_longer(!Country.Code, names_to = "year", values_to = "area_km") %>%
rename(ISO = Country.Code)
#fixing variable types
countryarea$ISO <- as.factor(countryarea$ISO)
countryarea$year <- as.numeric(countryarea$year)
str(countryarea)
## tibble [16,104 × 3] (S3: tbl_df/tbl/data.frame)
## $ ISO : Factor w/ 264 levels "ABW","AFG","AGO",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : num [1:16104] 1960 1961 1962 1963 1964 ...
## $ area_km: num [1:16104] NA 180 180 180 180 180 180 180 180 180 ...
For some reason, the data from Worldbank on each countries area only starts in 1961. For this analysis, 1960 is needed. Country area from 1961 will be substituted for 1960, with the assumption that country area doesn’t change drastically in one year (it doesn’t).
head(subset(countryarea, c(year == 1960 | year == 1961)))
## # A tibble: 6 x 3
## ISO year area_km
## <fct> <dbl> <dbl>
## 1 ABW 1960 NA
## 2 ABW 1961 180
## 3 AFG 1960 NA
## 4 AFG 1961 652860
## 5 AGO 1960 NA
## 6 AGO 1961 1246700
#fixing 1960
countryarea <-
countryarea %>%
group_by(ISO) %>%
fill(area_km, .direction = "up") %>%
ungroup()
Merge the country area with the AEI df here.
#combine these data sets
aei <- aei %>% merge(y = countryarea, by.x = c("ISO", "year"), all.x = TRUE)
rm(countryarea)
head(aei)
## ISO year country ID aei_ha yearcount area_km
## 1 ABW 1960 Aruba 1 0 0 180
## 2 ABW 1970 Aruba 1 0 10 180
## 3 ABW 1980 Aruba 1 0 20 180
## 4 ABW 1985 Aruba 1 0 25 180
## 5 ABW 1990 Aruba 1 0 30 180
## 6 ABW 1995 Aruba 1 0 35 180
Some countries also have no area value. I’ll go through and see if any of the countries with missing area values have non zero AEI_ha. If so, I’ll replace them, as they didn’t come through with the WorldBank data set but are online. Copying and pasting from various sources should be fine.
#checking for countries with NA values in their area_km col
#looking primarily for countries that have non 0 AEI_ha
aei[is.na(aei$area_km),]
## ISO year country ID aei_ha yearcount area_km
## 25 AIA 1960 Anguilla 4 0 0 NA
## 26 AIA 1970 Anguilla 4 0 10 NA
## 27 AIA 1980 Anguilla 4 0 20 NA
## 28 AIA 1985 Anguilla 4 0 25 NA
## 29 AIA 1990 Anguilla 4 0 30 NA
## 30 AIA 1995 Anguilla 4 0 35 NA
## 31 AIA 2000 Anguilla 4 0 40 NA
## 32 AIA 2005 Anguilla 4 0 45 NA
## 49 ANT 1960 Netherlands Antilles 8 0 0 NA
## 50 ANT 1970 Netherlands Antilles 8 0 10 NA
## 51 ANT 1980 Netherlands Antilles 8 0 20 NA
## 52 ANT 1985 Netherlands Antilles 8 0 25 NA
## 53 ANT 1990 Netherlands Antilles 8 0 30 NA
## 54 ANT 1995 Netherlands Antilles 8 0 35 NA
## 55 ANT 2000 Netherlands Antilles 8 0 40 NA
## 56 ANT 2005 Netherlands Antilles 8 0 45 NA
## 89 ATA 1960 Antarctica 13 0 0 NA
## 90 ATA 1970 Antarctica 13 0 10 NA
## 91 ATA 1980 Antarctica 13 0 20 NA
## 92 ATA 1985 Antarctica 13 0 25 NA
## 93 ATA 1990 Antarctica 13 0 30 NA
## 94 ATA 1995 Antarctica 13 0 35 NA
## 95 ATA 2000 Antarctica 13 0 40 NA
## 96 ATA 2005 Antarctica 13 0 45 NA
## 265 BVT 1960 Bouvet Island 36 0 0 NA
## 266 BVT 1970 Bouvet Island 36 0 10 NA
## 267 BVT 1980 Bouvet Island 36 0 20 NA
## 268 BVT 1985 Bouvet Island 36 0 25 NA
## 269 BVT 1990 Bouvet Island 36 0 30 NA
## 270 BVT 1995 Bouvet Island 36 0 35 NA
## 271 BVT 2000 Bouvet Island 36 0 40 NA
## 272 BVT 2005 Bouvet Island 36 0 45 NA
## 297 CCK 1960 Cocos Islands 42 0 0 NA
## 298 CCK 1970 Cocos Islands 42 0 10 NA
## 299 CCK 1980 Cocos Islands 42 0 20 NA
## 300 CCK 1985 Cocos Islands 42 0 25 NA
## 301 CCK 1990 Cocos Islands 42 0 30 NA
## 302 CCK 1995 Cocos Islands 42 0 35 NA
## 303 CCK 2000 Cocos Islands 42 0 40 NA
## 304 CCK 2005 Cocos Islands 42 0 45 NA
## 361 COK 1960 Cook Islands 50 0 0 NA
## 362 COK 1970 Cook Islands 50 0 10 NA
## 363 COK 1980 Cook Islands 50 0 20 NA
## 364 COK 1985 Cook Islands 50 0 25 NA
## 365 COK 1990 Cook Islands 50 0 30 NA
## 366 COK 1995 Cook Islands 50 0 35 NA
## 367 COK 2000 Cook Islands 50 0 40 NA
## 368 COK 2005 Cook Islands 50 0 45 NA
## 409 CXR 1960 Christmas Island 56 0 0 NA
## 410 CXR 1970 Christmas Island 56 0 10 NA
## 411 CXR 1980 Christmas Island 56 0 20 NA
## 412 CXR 1985 Christmas Island 56 0 25 NA
## 413 CXR 1990 Christmas Island 56 0 30 NA
## 414 CXR 1995 Christmas Island 56 0 35 NA
## 415 CXR 2000 Christmas Island 56 0 40 NA
## 416 CXR 2005 Christmas Island 56 0 45 NA
## 513 ESH 1960 Western Sahara 69 0 0 NA
## 514 ESH 1970 Western Sahara 69 0 10 NA
## 515 ESH 1980 Western Sahara 69 0 20 NA
## 516 ESH 1985 Western Sahara 69 0 25 NA
## 517 ESH 1990 Western Sahara 69 0 30 NA
## 518 ESH 1995 Western Sahara 69 0 35 NA
## 519 ESH 2000 Western Sahara 69 0 40 NA
## 520 ESH 2005 Western Sahara 69 0 45 NA
## 561 FLK 1960 Falkland Islands 75 0 0 NA
## 562 FLK 1970 Falkland Islands 75 0 10 NA
## 563 FLK 1980 Falkland Islands 75 0 20 NA
## 564 FLK 1985 Falkland Islands 75 0 25 NA
## 565 FLK 1990 Falkland Islands 75 0 30 NA
## 566 FLK 1995 Falkland Islands 75 0 35 NA
## 567 FLK 2000 Falkland Islands 75 0 40 NA
## 568 FLK 2005 Falkland Islands 75 0 45 NA
## 641 GLP 1960 Guadeloupe 86 889 0 NA
## 642 GLP 1970 Guadeloupe 86 2000 10 NA
## 643 GLP 1980 Guadeloupe 86 2000 20 NA
## 644 GLP 1985 Guadeloupe 86 2000 25 NA
## 645 GLP 1990 Guadeloupe 86 2000 30 NA
## 646 GLP 1995 Guadeloupe 86 2000 35 NA
## 647 GLP 2000 Guadeloupe 86 6000 40 NA
## 648 GLP 2005 Guadeloupe 86 6454 45 NA
## 705 GUF 1960 French Guiana 94 1000 0 NA
## 706 GUF 1970 French Guiana 94 1000 10 NA
## 707 GUF 1980 French Guiana 94 1000 20 NA
## 708 GUF 1985 French Guiana 94 2000 25 NA
## 709 GUF 1990 French Guiana 94 2000 30 NA
## 710 GUF 1995 French Guiana 94 3500 35 NA
## 711 GUF 2000 French Guiana 94 4800 40 NA
## 712 GUF 2005 French Guiana 94 5608 45 NA
## 729 HMD 1960 Heard and McDonald Islands 98 0 0 NA
## 730 HMD 1970 Heard and McDonald Islands 98 0 10 NA
## 731 HMD 1980 Heard and McDonald Islands 98 0 20 NA
## 732 HMD 1985 Heard and McDonald Islands 98 0 25 NA
## 733 HMD 1990 Heard and McDonald Islands 98 0 30 NA
## 734 HMD 1995 Heard and McDonald Islands 98 0 35 NA
## 735 HMD 2000 Heard and McDonald Islands 98 0 40 NA
## 736 HMD 2005 Heard and McDonald Islands 98 0 45 NA
## 785 IOT 1960 British Indian Ocean Territory 106 0 0 NA
## 786 IOT 1970 British Indian Ocean Territory 106 0 10 NA
## 787 IOT 1980 British Indian Ocean Territory 106 0 20 NA
## 788 IOT 1985 British Indian Ocean Territory 106 0 25 NA
## 789 IOT 1990 British Indian Ocean Territory 106 0 30 NA
## 790 IOT 1995 British Indian Ocean Territory 106 0 35 NA
## 791 IOT 2000 British Indian Ocean Territory 106 0 40 NA
## 792 IOT 2005 British Indian Ocean Territory 106 0 45 NA
## 1129 MSR 1960 Montserrat 154 0 0 NA
## 1130 MSR 1970 Montserrat 154 0 10 NA
## 1131 MSR 1980 Montserrat 154 0 20 NA
## 1132 MSR 1985 Montserrat 154 0 25 NA
## 1133 MSR 1990 Montserrat 154 0 30 NA
## 1134 MSR 1995 Montserrat 154 0 35 NA
## 1135 MSR 2000 Montserrat 154 0 40 NA
## 1136 MSR 2005 Montserrat 154 0 45 NA
## 1137 MTQ 1960 Martinique 155 1000 0 NA
## 1138 MTQ 1970 Martinique 155 1000 10 NA
## 1139 MTQ 1980 Martinique 155 5000 20 NA
## 1140 MTQ 1985 Martinique 155 4000 25 NA
## 1141 MTQ 1990 Martinique 155 4000 30 NA
## 1142 MTQ 1995 Martinique 155 5000 35 NA
## 1143 MTQ 2000 Martinique 155 7000 40 NA
## 1144 MTQ 2005 Martinique 155 6407 45 NA
## 1193 NFK 1960 Norfolk Island 163 0 0 NA
## 1194 NFK 1970 Norfolk Island 163 0 10 NA
## 1195 NFK 1980 Norfolk Island 163 0 20 NA
## 1196 NFK 1985 Norfolk Island 163 0 25 NA
## 1197 NFK 1990 Norfolk Island 163 0 30 NA
## 1198 NFK 1995 Norfolk Island 163 0 35 NA
## 1199 NFK 2000 Norfolk Island 163 0 40 NA
## 1200 NFK 2005 Norfolk Island 163 0 45 NA
## 1217 NIU 1960 Niue 166 0 0 NA
## 1218 NIU 1970 Niue 166 0 10 NA
## 1219 NIU 1980 Niue 166 0 20 NA
## 1220 NIU 1985 Niue 166 0 25 NA
## 1221 NIU 1990 Niue 166 0 30 NA
## 1222 NIU 1995 Niue 166 0 35 NA
## 1223 NIU 2000 Niue 166 0 40 NA
## 1224 NIU 2005 Niue 166 0 45 NA
## 1289 PCN 1960 Pitcairn Islands 175 0 0 NA
## 1290 PCN 1970 Pitcairn Islands 175 0 10 NA
## 1291 PCN 1980 Pitcairn Islands 175 0 20 NA
## 1292 PCN 1985 Pitcairn Islands 175 0 25 NA
## 1293 PCN 1990 Pitcairn Islands 175 0 30 NA
## 1294 PCN 1995 Pitcairn Islands 175 0 35 NA
## 1295 PCN 2000 Pitcairn Islands 175 0 40 NA
## 1296 PCN 2005 Pitcairn Islands 175 0 45 NA
## 1393 REU 1960 Reunion 188 2930 0 NA
## 1394 REU 1970 Reunion 188 5000 10 NA
## 1395 REU 1980 Reunion 188 5000 20 NA
## 1396 REU 1985 Reunion 188 8000 25 NA
## 1397 REU 1990 Reunion 188 11000 30 NA
## 1398 REU 1995 Reunion 188 12000 35 NA
## 1399 REU 2000 Reunion 188 12000 40 NA
## 1400 REU 2005 Reunion 188 9722 45 NA
## 1433 SDN 1960 Sudan 193 1387500 0 NA
## 1434 SDN 1970 Sudan 193 1625000 10 NA
## 1435 SDN 1980 Sudan 193 1700000 20 NA
## 1436 SDN 1985 Sudan 193 1763000 25 NA
## 1437 SDN 1990 Sudan 193 1800000 30 NA
## 1438 SDN 1995 Sudan 193 1946000 35 NA
## 1439 SDN 2000 Sudan 193 1863000 40 NA
## 1440 SDN 2005 Sudan 193 1863000 45 NA
## 1457 SHN 1960 Saint Helena 197 0 0 NA
## 1458 SHN 1970 Saint Helena 197 0 10 NA
## 1459 SHN 1980 Saint Helena 197 0 20 NA
## 1460 SHN 1985 Saint Helena 197 0 25 NA
## 1461 SHN 1990 Saint Helena 197 0 30 NA
## 1462 SHN 1995 Saint Helena 197 0 35 NA
## 1463 SHN 2000 Saint Helena 197 0 40 NA
## 1464 SHN 2005 Saint Helena 197 0 45 NA
## 1465 SJM 1960 Svalbard and Jan Mayen 198 0 0 NA
## 1466 SJM 1970 Svalbard and Jan Mayen 198 0 10 NA
## 1467 SJM 1980 Svalbard and Jan Mayen 198 0 20 NA
## 1468 SJM 1985 Svalbard and Jan Mayen 198 0 25 NA
## 1469 SJM 1990 Svalbard and Jan Mayen 198 0 30 NA
## 1470 SJM 1995 Svalbard and Jan Mayen 198 0 35 NA
## 1471 SJM 2000 Svalbard and Jan Mayen 198 0 40 NA
## 1472 SJM 2005 Svalbard and Jan Mayen 198 0 45 NA
## 1497 SMK 1960 Serbia, Montenegro, Kosovo 400 142175 0 NA
## 1498 SMK 1970 Serbia, Montenegro, Kosovo 400 157450 10 NA
## 1499 SMK 1980 Serbia, Montenegro, Kosovo 400 176250 20 NA
## 1500 SMK 1985 Serbia, Montenegro, Kosovo 400 173900 25 NA
## 1501 SMK 1990 Serbia, Montenegro, Kosovo 400 183364 30 NA
## 1502 SMK 1995 Serbia, Montenegro, Kosovo 400 125960 35 NA
## 1503 SMK 2000 Serbia, Montenegro, Kosovo 400 66663 40 NA
## 1504 SMK 2005 Serbia, Montenegro, Kosovo 400 165426 45 NA
## 1521 SPM 1960 St. Pierre and Miquelon 205 0 0 NA
## 1522 SPM 1970 St. Pierre and Miquelon 205 0 10 NA
## 1523 SPM 1980 St. Pierre and Miquelon 205 0 20 NA
## 1524 SPM 1985 St. Pierre and Miquelon 205 0 25 NA
## 1525 SPM 1990 St. Pierre and Miquelon 205 0 30 NA
## 1526 SPM 1995 St. Pierre and Miquelon 205 0 35 NA
## 1527 SPM 2000 St. Pierre and Miquelon 205 0 40 NA
## 1528 SPM 2005 St. Pierre and Miquelon 205 0 45 NA
## 1633 TKL 1960 Tokelau 220 0 0 NA
## 1634 TKL 1970 Tokelau 220 0 10 NA
## 1635 TKL 1980 Tokelau 220 0 20 NA
## 1636 TKL 1985 Tokelau 220 0 25 NA
## 1637 TKL 1990 Tokelau 220 0 30 NA
## 1638 TKL 1995 Tokelau 220 0 35 NA
## 1639 TKL 2000 Tokelau 220 0 40 NA
## 1640 TKL 2005 Tokelau 220 0 45 NA
## 1697 TWN 1960 Taiwan 228 488697 0 NA
## 1698 TWN 1970 Taiwan 228 521056 10 NA
## 1699 TWN 1980 Taiwan 228 546646 20 NA
## 1700 TWN 1985 Taiwan 228 550372 25 NA
## 1701 TWN 1990 Taiwan 228 510500 30 NA
## 1702 TWN 1995 Taiwan 228 525528 35 NA
## 1703 TWN 2000 Taiwan 228 508990 40 NA
## 1704 TWN 2005 Taiwan 228 492452 45 NA
## 1801 WLF 1960 Wallis and Futuna 243 0 0 NA
## 1802 WLF 1970 Wallis and Futuna 243 0 10 NA
## 1803 WLF 1980 Wallis and Futuna 243 0 20 NA
## 1804 WLF 1985 Wallis and Futuna 243 0 25 NA
## 1805 WLF 1990 Wallis and Futuna 243 0 30 NA
## 1806 WLF 1995 Wallis and Futuna 243 0 35 NA
## 1807 WLF 2000 Wallis and Futuna 243 0 40 NA
## 1808 WLF 2005 Wallis and Futuna 243 0 45 NA
#fixing counties with no area
aei$area_km[which(aei$ISO == "GUF")] <- 83534 #French Guiana from Britannica
aei$area_km[which(aei$ISO == "GLP")] <- 1630 #Guadeloupe from Britannica
aei$area_km[which(aei$ISO == "MTQ")] <- 1128 # Martinique from Britannica
aei$area_km[which(aei$ISO == "REU")] <- 2512 #Reunion from Britannica
aei$area_km[which(aei$ISO == "SMK")] <- 13812 + 10905 + 77589 #Montenegro + Kosovo + Serbia from Britannica
aei$area_km[which(aei$ISO == "TWN")] <- 35980 #Taiwan from CIA.gov
aei$area_km[which(aei$ISO == "SDN")] <- 1882000 #Sudan from UNDP
Now, transform the AEI_ha to \(km^2\) and divide to get the % of total country area that is equipped for irrigation. There are still some countries that have 0 AEI_ha and no area_km value. I think I will assign the irrfrac/perc 0 to those countries, but lets wait a little while to do that, most are islands.
aei <- aei %>% mutate(irrperc = ((aei_ha/100)/area_km)*100, irrfrac = ((aei_ha/100)/area_km))
Here is a quick histogram of the target variable. A lot of 0s.
hist(aei$irrperc, breaks = 200)
According to the FAO (Walker 1989), the choice of which irrigation, and by proxy whether to irrigate or not, comes from six factors. The predictor that will be used for this analysis are listed below:
Economic
Topographical
Soil Type
Water Supply
Crop Type
Social Influences
All variables will be scaled or centered.
First, the ISO codes and Regions to be able to hold some structure.
#the ISO codes and regional data from Gapminder
iso <- read.csv2("/Volumes/RachelExternal/Thesis/Thesis/Data/GAPminder_ISOcodes.csv")
iso <- iso[, -c(2,7,10,12,13)] #removing extra info
#renaming some cols
iso <-
iso %>%
rename(ISO = GEO, country = name)
#fixing the variable types
iso$ISO <- as.factor(iso$ISO)
iso$country <- as.factor(iso$country)
iso$four_regions <- as.factor(iso$four_regions)
iso$eight_regions <- as.factor(iso$eight_regions)
iso$six_regions <- as.factor(iso$six_regions)
iso$six_regions <- as.factor(iso$six_regions)
iso$World.bank.region <- as.factor(iso$World.bank.region)
str(iso)
## 'data.frame': 197 obs. of 8 variables:
## $ ISO : Factor w/ 197 levels "AFG","AGO","ALB",..: 1 3 50 4 2 8 6 7 9 10 ...
## $ country : Factor w/ 197 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ four_regions : Factor w/ 4 levels "africa","americas",..: 3 4 1 4 1 2 2 4 3 4 ...
## $ eight_regions : Factor w/ 8 levels "africa_north",..: 5 7 1 8 2 3 4 7 6 8 ...
## $ six_regions : Factor w/ 6 levels "america","east_asia_pacific",..: 5 3 4 3 6 1 1 3 2 3 ...
## $ Latitude : num 33 41 28 42.5 -12.5 ...
## $ Longitude : num 66 20 3 1.52 18.5 ...
## $ World.bank.region: Factor w/ 8 levels "","East Asia & Pacific",..: 7 3 5 3 8 4 4 3 2 3 ...
#past me helping future me
#fixing Macedonia, FYR to North Macedonia
iso$country <- as.character(iso$country)
iso[iso == "Macedonia, FYR"] <- "North Macedonia"
iso$country <- as.factor(iso$country)
Cool, we have multiple regions here that we could use in the future for some multilevel business. Also, latitude and longitude is here, which we can also try later. Obviously these lat and lon measurements will mean nothing for a country like China or the USA, but they are nice to have.
Income data is coming from Gapminder, as they have put real effort in to filling all the gaps in income and population for the last 150 years. Income is in terms of the 2011 USD and is inflation adjusted.
#income here is 2011 usd
income <- read.csv("/Volumes/RachelExternal/Thesis/Thesis/Data/Gapminder_income_per_person_gdppercapita_ppp_inflation_adjusted.csv")
income <- income[, c(1, 162:207)] #removing extra years
#making long data
income <-
income %>%
pivot_longer(!country, names_to = "year", values_to = "income")
#still have the xs there
income$year <- str_remove_all(income$year, "[X]")
#transformation of the variable types
income$year <- as.numeric(income$year)
income$country <- as.factor(income$country)
#log and standardize
income$log_income <- log(income$income)
income <- income %>%
mutate(log_income_std = log_income / mean(log_income))
#a summary
summary(income)
## country year income log_income
## Afghanistan : 46 Min. :1960 Min. : 312 Min. : 5.743
## Albania : 46 1st Qu.:1971 1st Qu.: 2250 1st Qu.: 7.719
## Algeria : 46 Median :1982 Median : 5290 Median : 8.574
## Andorra : 46 Mean :1982 Mean : 11344 Mean : 8.635
## Angola : 46 3rd Qu.:1994 3rd Qu.: 13575 3rd Qu.: 9.516
## Antigua and Barbuda: 46 Max. :2005 Max. :179000 Max. :12.095
## (Other) :8602
## log_income_std
## Min. :0.6651
## 1st Qu.:0.8939
## Median :0.9929
## Mean :1.0000
## 3rd Qu.:1.1020
## Max. :1.4007
##
#past me helping future me again
#fixing Eswatini to Swaziland (even though Eswatini is the correct name)
income$country <- as.character(income$country)
income[income == "Eswatini"] <- "Swaziland"
income$country <- as.factor(income$country)
Logging and standardizing is necessary here. Again, the argument is that wealth begets wealth, so income is logarithmic. From there just center the mean at 0 with a stdev of 1 for standardization.
Population data is also taken from Gapminder. The same process of logging and standardizing will be carried out here.
#population data also taken from Gapminder
population <- read.csv("/Volumes/RachelExternal/Thesis/Thesis/Data/Gapminder_population_total.csv")
#removing extra years
population <- population[, c(1, 162:207)]
#making longer data
population <-
population %>%
pivot_longer(!country, names_to = "year", values_to = "population")
#removing the Xs from the year strings
population$year <- str_remove_all(population$year, "[X]")
#fixing variable types
population$country <- as.factor(population$country)
population$year <- as.numeric(population$year)
#log and standardize, same argument as income: population begets population
population$log_pop <- log(population$population)
population <- population %>%
mutate(log_pop_std = log_pop / mean(log_pop))
summary(population)
## country year population log_pop
## Afghanistan : 46 Min. :1960 Min. :6.450e+02 Min. : 6.469
## Albania : 46 1st Qu.:1971 1st Qu.:9.045e+05 1st Qu.:13.715
## Algeria : 46 Median :1982 Median :4.545e+06 Median :15.330
## Andorra : 46 Mean :1982 Mean :2.404e+07 Mean :14.988
## Angola : 46 3rd Qu.:1994 3rd Qu.:1.320e+07 3rd Qu.:16.396
## Antigua and Barbuda: 46 Max. :2005 Max. :1.330e+09 Max. :21.008
## (Other) :8694
## log_pop_std
## Min. :0.4316
## 1st Qu.:0.9151
## Median :1.0228
## Mean :1.0000
## 3rd Qu.:1.0939
## Max. :1.4017
##
#annnd again...
#fixing Eswatini to Swaziland (even though Eswatini is the correct name)
population$country <- as.character(population$country)
population[population == "Eswatini"] <- "Swaziland"
population$country <- as.factor(population$country)
So some issues arise here with the fact that we have different country data for each dataframe. Sometimes the names are capitalized, other times there are spaces instead of underscores. By using anti_join() I can see what won’t be joined. Also this goes both ways, anti_join() returns rows of x that do not have a match in y.
isopop <- anti_join(iso, population, by = "country")
summary(isopop)
## ISO country four_regions eight_regions
## HKG :1 Hong Kong, China:1 africa :0 east_asia_pacific :2
## TWN :1 Taiwan :1 americas:0 africa_north :0
## AFG :0 Afghanistan :0 asia :2 africa_sub_saharan:0
## AGO :0 Albania :0 europe :0 america_north :0
## ALB :0 Algeria :0 america_south :0
## AND :0 Andorra :0 asia_west :0
## (Other):0 (Other) :0 (Other) :0
## six_regions Latitude Longitude
## america :0 Min. :22.29 Min. :114.2
## east_asia_pacific :2 1st Qu.:22.71 1st Qu.:115.9
## europe_central_asia :0 Median :23.14 Median :117.6
## middle_east_north_africa:0 Mean :23.14 Mean :117.6
## south_asia :0 3rd Qu.:23.57 3rd Qu.:119.3
## sub_saharan_africa :0 Max. :24.00 Max. :121.0
##
## World.bank.region
## East Asia & Pacific :2
## :0
## Europe & Central Asia :0
## Latin America & Caribbean :0
## Middle East & North Africa:0
## North America :0
## (Other) :0
popiso <- anti_join(population, iso, by = "country")
summary(popiso)
## country year population log_pop
## Afghanistan :0 Min. : NA Min. : NA Min. : NA
## Albania :0 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
## Algeria :0 Median : NA Median : NA Median : NA
## Andorra :0 Mean :NaN Mean :NaN Mean :NaN
## Angola :0 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
## Antigua and Barbuda:0 Max. : NA Max. : NA Max. : NA
## (Other) :0
## log_pop_std
## Min. : NA
## 1st Qu.: NA
## Median : NA
## Mean :NaN
## 3rd Qu.: NA
## Max. : NA
##
Went back up and solved the issue of North Macedonia and Swaziland. There are still issues with Hong Kong and Taiwan as they are part of China. ISO has data for Hong Kong and Taiwan that population does not have. I wont include them in the HTML but they are in the code.
Alright, so I will left join ISO and population then again with income? Not sure that this will solve my problem, but lets try it.
#remove all these rando tables
rm(popinc, inciso, incpop, popiso, isoinc, isopop)
#
iso <- left_join(iso, population, by = "country")
iso <- left_join(iso, income, by = c("country", "year"))
# alright 197 countries! whoop!
str(iso)
## 'data.frame': 8972 obs. of 15 variables:
## $ ISO : Factor w/ 197 levels "AFG","AGO","ALB",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ country : Factor w/ 197 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ four_regions : Factor w/ 4 levels "africa","americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ eight_regions : Factor w/ 8 levels "africa_north",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ six_regions : Factor w/ 6 levels "america","east_asia_pacific",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ Latitude : num 33 33 33 33 33 33 33 33 33 33 ...
## $ Longitude : num 66 66 66 66 66 66 66 66 66 66 ...
## $ World.bank.region: Factor w/ 8 levels "","East Asia & Pacific",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ year : num 1960 1961 1962 1963 1964 ...
## $ population : int 9000000 9170000 9350000 9540000 9740000 9960000 10200000 10400000 10600000 10900000 ...
## $ log_pop : num 16 16 16.1 16.1 16.1 ...
## $ log_pop_std : num 1.07 1.07 1.07 1.07 1.07 ...
## $ income : int 2740 2700 2680 2670 2650 2640 2600 2600 2620 2590 ...
## $ log_income : num 7.92 7.9 7.89 7.89 7.88 ...
## $ log_income_std : num 0.917 0.915 0.914 0.914 0.913 ...
rm(income, population)
Now, join ISO to AEI with a merge (cause in this case the dynamics of merge are clearer for me). Hoping to preserve all data!
aei <- merge(aei, iso, by = c("ISO","country", "year"), all.x = TRUE)
rm(iso)
str(aei)
## 'data.frame': 1848 obs. of 21 variables:
## $ ISO : Factor w/ 232 levels "","ABW","AFG",..: 2 2 2 2 2 2 2 2 3 3 ...
## $ country : Factor w/ 232 levels "Afghanistan",..: 12 12 12 12 12 12 12 12 1 1 ...
## $ year : num 1960 1970 1980 1985 1990 ...
## $ ID : int 1 1 1 1 1 1 1 1 2 2 ...
## $ aei_ha : num 0 0 0 0 0 ...
## $ yearcount : num 0 10 20 25 30 35 40 45 0 10 ...
## $ area_km : num 180 180 180 180 180 ...
## $ irrperc : num 0 0 0 0 0 ...
## $ irrfrac : num 0 0 0 0 0 ...
## $ four_regions : Factor w/ 4 levels "africa","americas",..: NA NA NA NA NA NA NA NA 3 3 ...
## $ eight_regions : Factor w/ 8 levels "africa_north",..: NA NA NA NA NA NA NA NA 5 5 ...
## $ six_regions : Factor w/ 6 levels "america","east_asia_pacific",..: NA NA NA NA NA NA NA NA 5 5 ...
## $ Latitude : num NA NA NA NA NA NA NA NA 33 33 ...
## $ Longitude : num NA NA NA NA NA NA NA NA 66 66 ...
## $ World.bank.region: Factor w/ 8 levels "","East Asia & Pacific",..: NA NA NA NA NA NA NA NA 7 7 ...
## $ population : int NA NA NA NA NA NA NA NA 9000000 11200000 ...
## $ log_pop : num NA NA NA NA NA ...
## $ log_pop_std : num NA NA NA NA NA ...
## $ income : int NA NA NA NA NA NA NA NA 2740 2570 ...
## $ log_income : num NA NA NA NA NA ...
## $ log_income_std : num NA NA NA NA NA ...
Solved the issue, 232 factors for both countries and ISO, meaning that we have been using the original AEI file from Siebert as the main data frame to join to. Good. This is the data that needs to be maintained, NAs can pop up in other cols but not irrigated area.
But I still would like total GDP for later so lets transform using population.
aei$income <- as.numeric(aei$income)
#mutating for total GDP and log GDP
aei <-
aei %>%
mutate(GDPtot = income*population) %>%
mutate(log_gdp_tot = log(GDPtot))
#creating the df to standardize without the NAs
aei2 <-
aei %>%
select(ISO, year, log_gdp_tot) %>%
na.omit() %>%
mutate(log_gdp_tot_std = log_gdp_tot/mean(log_gdp_tot)) %>%
select(ISO, year, log_gdp_tot_std)
#now rejoin!
aei <-
left_join(aei, aei2, by = c("ISO", "year"))
str(aei)
## 'data.frame': 1848 obs. of 24 variables:
## $ ISO : Factor w/ 232 levels "","ABW","AFG",..: 2 2 2 2 2 2 2 2 3 3 ...
## $ country : Factor w/ 232 levels "Afghanistan",..: 12 12 12 12 12 12 12 12 1 1 ...
## $ year : num 1960 1970 1980 1985 1990 ...
## $ ID : int 1 1 1 1 1 1 1 1 2 2 ...
## $ aei_ha : num 0 0 0 0 0 ...
## $ yearcount : num 0 10 20 25 30 35 40 45 0 10 ...
## $ area_km : num 180 180 180 180 180 ...
## $ irrperc : num 0 0 0 0 0 ...
## $ irrfrac : num 0 0 0 0 0 ...
## $ four_regions : Factor w/ 4 levels "africa","americas",..: NA NA NA NA NA NA NA NA 3 3 ...
## $ eight_regions : Factor w/ 8 levels "africa_north",..: NA NA NA NA NA NA NA NA 5 5 ...
## $ six_regions : Factor w/ 6 levels "america","east_asia_pacific",..: NA NA NA NA NA NA NA NA 5 5 ...
## $ Latitude : num NA NA NA NA NA NA NA NA 33 33 ...
## $ Longitude : num NA NA NA NA NA NA NA NA 66 66 ...
## $ World.bank.region: Factor w/ 8 levels "","East Asia & Pacific",..: NA NA NA NA NA NA NA NA 7 7 ...
## $ population : int NA NA NA NA NA NA NA NA 9000000 11200000 ...
## $ log_pop : num NA NA NA NA NA ...
## $ log_pop_std : num NA NA NA NA NA ...
## $ income : num NA NA NA NA NA NA NA NA 2740 2570 ...
## $ log_income : num NA NA NA NA NA ...
## $ log_income_std : num NA NA NA NA NA ...
## $ GDPtot : num NA NA NA NA NA ...
## $ log_gdp_tot : num NA NA NA NA NA ...
## $ log_gdp_tot_std : num NA NA NA NA NA ...
Finally, lets check out a quick histogram of GDP and pop,
hist(aei$log_gdp_tot_std, breaks = 100)
hist(aei$log_pop_std, breaks = 100)
There are a lot of countries that have no data for the regions, but have non zero irrperc/frac which I will use in further analysis, so we need to replace the regions.
#view which countries don't have regions but have non zero irrprec
na_regions <- aei[is.na(aei$four_regions),]
#Aruba
aei <- aei %>%
mutate(four_regions = replace(four_regions, ISO == 'ABW', 'americas'),
six_regions = replace(six_regions, ISO == 'ABW', 'america'),
eight_regions = replace(eight_regions, ISO == 'ABW', 'america_north')) %>%
#american samoa
mutate(four_regions = replace(four_regions,ISO == 'ASM', 'asia'),
six_regions = replace(six_regions,ISO == 'ASM', 'east_asia_pacific'),
eight_regions = replace(eight_regions,ISO == 'ASM', 'east_asia_pacific')) %>%
#bermuda
mutate(four_regions = replace(four_regions,ISO == 'BMU', 'americas'),
six_regions = replace(six_regions,ISO == 'BMU', 'america'),
eight_regions = replace(eight_regions,ISO == 'BMU', 'america_north')) %>%
#cote d'ivoire
mutate(four_regions = replace(four_regions,ISO == 'CIV', 'africa'),
six_regions = replace(six_regions,ISO == 'CIV', 'sub_saharan_africa'),
eight_regions = replace(eight_regions,ISO == 'CIV', 'africa_sub_saharan')) %>%
#democratic republic of congo
mutate(four_regions = replace(four_regions,ISO == 'COD', 'africa'),
six_regions = replace(six_regions,ISO == 'COD', 'sub_saharan_africa'),
eight_regions = replace(eight_regions,ISO == 'COD', 'africa_sub_saharan')) %>%
#republic of congo
mutate(four_regions = replace(four_regions,ISO == 'COG', 'africa'),
six_regions = replace(six_regions,ISO == 'COG', 'sub_saharan_africa'),
eight_regions = replace(eight_regions,ISO == 'COG', 'africa_sub_saharan')) %>%
#cayman islands
mutate(four_regions = replace(four_regions,ISO == 'CYM', 'americas'),
six_regions = replace(six_regions,ISO == 'CYM', 'america'),
eight_regions = replace(eight_regions,ISO == 'CYM', 'america_north')) %>%
#Faroe islands
mutate(four_regions = replace(four_regions,ISO == 'FRO', 'europe'),
six_regions = replace(six_regions,ISO == 'FRO', 'europe_central_asia'),
eight_regions = replace(eight_regions,ISO == 'FRO', 'europe_west')) %>%
#micronesia
mutate(four_regions = replace(four_regions,ISO == 'FSM', 'asia'),
six_regions = replace(six_regions,ISO == 'FSM', 'east_asia_pacific'),
eight_regions = replace(eight_regions,ISO == 'FSM', 'east_asia_pacific')) %>%
#gibraltar
mutate(four_regions = replace(four_regions,ISO == 'GIB', 'europe'),
six_regions = replace(six_regions,ISO == 'GIB', 'europe_central_asia'),
eight_regions = replace(eight_regions,ISO == 'GIB', 'europe_west')) %>%
#Guadeloupe
mutate(four_regions = replace(four_regions,ISO == 'GLP', 'americas'),
six_regions = replace(six_regions,ISO == 'GLP', 'america'),
eight_regions = replace(eight_regions,ISO == 'GLP', 'america_north')) %>%
#greenland
mutate(four_regions = replace(four_regions,ISO == 'GRL', 'europe'),
six_regions = replace(six_regions,ISO == 'GRL', 'europe_central_asia'),
eight_regions = replace(eight_regions,ISO == 'GRL', 'europe_west')) %>%
#french Guiana
mutate(four_regions = replace(four_regions,ISO == 'GUF', 'americas'),
six_regions = replace(six_regions,ISO == 'GUF', 'america'),
eight_regions = replace(eight_regions,ISO == 'GUF', 'america_south')) %>%
#guam
mutate(four_regions = replace(four_regions,ISO == 'GUM', 'asia'),
six_regions = replace(six_regions,ISO == 'GUM', 'east_asia_pacific'),
eight_regions = replace(eight_regions,ISO == 'GUM', 'east_asia_pacific')) %>%
#Kyrgyzstan
mutate(four_regions = replace(four_regions,ISO == 'KGZ', 'asia'),
six_regions = replace(six_regions,ISO == 'KGZ', 'europe_central_asia'),
eight_regions = replace(eight_regions,ISO == 'KGZ', 'asia_west')) %>%
#laos
mutate(four_regions = replace(four_regions,ISO == 'LAO', 'asia'),
six_regions = replace(six_regions,ISO == 'LAO', 'east_asia_pacific'),
eight_regions = replace(eight_regions,ISO == 'LAO', 'east_asia_pacific')) %>%
#Macedonia
mutate(four_regions = replace(four_regions,ISO == 'MKD', 'europe'),
six_regions = replace(six_regions,ISO == 'MKD', 'europe_central_asia'),
eight_regions = replace(eight_regions,ISO == 'MKD', 'europe_east')) %>%
#Martinique
mutate(four_regions = replace(four_regions,ISO == 'MTQ', 'americas'),
six_regions = replace(six_regions,ISO == 'MTQ', 'america'),
eight_regions = replace(eight_regions,ISO == 'MTQ', 'america_north')) %>%
#new caledonia
mutate(four_regions = replace(four_regions,ISO == 'NCL', 'asia'),
six_regions = replace(six_regions,ISO == 'NCL', 'east_asia_pacific'),
eight_regions = replace(eight_regions,ISO == 'NCL', 'east_asia_pacific')) %>%
#puerto rico
mutate(four_regions = replace(four_regions,ISO == 'PRI', 'americas'),
six_regions = replace(six_regions,ISO == 'PRI', 'america'),
eight_regions = replace(eight_regions,ISO == 'PRI', 'america_north')) %>%
#Palestine (West Bank + Gaza strip)
mutate(four_regions = replace(four_regions,ISO == 'PSE', 'asia'),
six_regions = replace(six_regions,ISO == 'PSE', 'middle_east_north_africa'),
eight_regions = replace(eight_regions,ISO == 'PSE', 'asia_west')) %>%
#french Polynesia
mutate(four_regions = replace(four_regions,ISO == 'PYF', 'asia'),
six_regions = replace(six_regions,ISO == 'PYF', 'east_asia_pacific'),
eight_regions = replace(eight_regions,ISO == 'PYF', 'east_asia_pacific')) %>%
#Reunion
mutate(four_regions = replace(four_regions,ISO == 'REU', 'africa'),
six_regions = replace(six_regions,ISO == 'REU', 'sub_saharan_africa'),
eight_regions = replace(eight_regions,ISO == 'REU', 'africa_sub_saharan')) %>%
#Serbia, Montenegro, Kosovo
mutate(four_regions = replace(four_regions,ISO == 'SMK', 'europe'),
six_regions = replace(six_regions,ISO == 'SMK', 'europe_central_asia'),
eight_regions = replace(eight_regions,ISO == 'SMK', 'europe_east')) %>%
#Slovakia
mutate(four_regions = replace(four_regions,ISO == 'SVK', 'europe'),
six_regions = replace(six_regions,ISO == 'SVK', 'europe_central_asia'),
eight_regions = replace(eight_regions,ISO == 'SVK', 'europe_east')) %>%
#Turks and Caicos Islands
mutate(four_regions = replace(four_regions,ISO == 'TCA', 'americas'),
six_regions = replace(six_regions,ISO == 'TCA', 'america'),
eight_regions = replace(eight_regions,ISO == 'TCA', 'america_north')) %>%
#East Timor
mutate(four_regions = replace(four_regions,ISO == 'TLS', 'asia'),
six_regions = replace(six_regions,ISO == 'TLS', 'east_asia_pacific'),
eight_regions = replace(eight_regions,ISO == 'TLS', 'east_asia_pacific')) %>%
#taiwan
mutate(four_regions = replace(four_regions,ISO == 'TWN', 'asia'),
six_regions = replace(six_regions,ISO == 'TWN', 'east_asia_pacific'),
eight_regions = replace(eight_regions,ISO == 'TWN', 'east_asia_pacific')) %>%
#British virgin Islands
mutate(four_regions = replace(four_regions,ISO == 'VGB', 'americas'),
six_regions = replace(six_regions,ISO == 'VGB', 'america'),
eight_regions = replace(eight_regions,ISO == 'VGB', 'america_north')) %>%
#US Virgin Islands
mutate(four_regions = replace(four_regions,ISO == 'VIR', 'americas'),
six_regions = replace(six_regions,ISO == 'VIR', 'america'),
eight_regions = replace(eight_regions, ISO == 'VIR', 'america_north'))
Next time I’ll write a function.
There are also a lot of countries that have NA values for their income or population. Take a peek here.
lotanas <- aei[rowSums(is.na(aei)) > 0,]
lotanas %>%
group_by(country) %>%
select(income, population) %>%
summarise_all(funs(sum(is.na(.))))
## # A tibble: 51 x 3
## country income population
## <fct> <int> <int>
## 1 American Samoa 8 8
## 2 Anguilla 8 8
## 3 Antarctica 8 8
## 4 Aruba 8 8
## 5 Bermuda 8 8
## 6 Bouvet Island 8 8
## 7 British Indian Ocean Territory 8 8
## 8 British Virgin Islands 8 8
## 9 Cayman Islands 8 8
## 10 Christmas Island 8 8
## # … with 41 more rows
The countries that don’t have income also don’t have population, so as least I am consistent in my missing data. I have to move on so I will just leave this here for a minute. Perhaps later we should fill these variables.
Again, precipitation is translated output from LPJmL. Precipitation was chosen because it could be summed across countries, as other data could not (like discharge or ET). I have summed the precipitation by country and year. I am not sure whether I should scale or standardize precip, I will do all the possible combinations at the moment.
#precipitation
#cftandprecip is a script with all the grid cell values.
cftandprecip <- read.csv('/Volumes/RachelExternal/Thesis/Thesis/Data/cft_and_precipdata.csv')
cftandprecip$ISO <- as.factor(cftandprecip$ISO)
cftandprecip <- cftandprecip[, -c(1)] #remove id col
cftandprecip <- subset(cftandprecip, year >= 1960)
#log and scale
precip <-
cftandprecip %>%
select(ISO, year, precipmm) %>%
group_by(ISO, year) %>%
summarise(preciptot = (sum(precipmm))) %>% #sum grid cells by ISO and year
mutate(precip_sc = preciptot/max(preciptot)) %>%
mutate(precip_std = preciptot/mean(preciptot)) %>%
mutate(log_precip = log(preciptot+1)) %>% #1 is added here to prevent -Inf
mutate(log_precip_sc = log_precip/max(log_precip)) %>%
mutate(log_precip_std = log_precip/mean(log_precip))
#merge
aei <- merge(aei, precip, by = c("ISO", "year"), all.x = TRUE)
rm(precip)
And some histograms to check out the scaled/standardized data.
hist(aei$preciptot, breaks = 100)
hist(aei$precip_sc, breaks = 100)
hist(aei$precip_std, breaks = 100)
hist(aei$log_precip, breaks = 100)
hist(aei$log_precip_sc, breaks = 100)
hist(aei$log_precip_std, breaks = 100)
Uff. Then comes crop fraction. As mentioned above individual crop fraction per grid cell is multiplied by respective grid cell area, summed on a country level giving ha of each crop grown per country. This was then divided by total area of the country to achieve the % area occupied by a certain crop per country.
#pull cft frac outta here
cft <-
cftandprecip %>%
select(-c(4)) %>% #removing precip
mutate_each(funs(.*cellaream2), c(4:35)) %>% #multiply all fractions with the cell area
group_by(ISO, year) %>% #group in to country and year
summarise_each(funs(sum)) %>% #take col sums for each crop
mutate(across(c(3:34), .fns = ~./cellaream2)) #divide col sums by total country area
#removing categories with 0 (the biofuels and such)
cft <- cft[, -c(18, 19, 26, 34, 35)]
#and a merge
aei <- merge(aei, cft, by = c("ISO", "year"), all.x = TRUE)
rm(cft, cftandprecip)
Idk how to show the data here. Any ideas on representation would be helpful here.
Topographical data is represented here by the data set from Riley, Degloria, and Elliot (1999) and is conviently available through the library(rethinking) from McElreath (2015). It provides a ruggedness factor for each country.
#rugged
data("rugged")
rug <- rugged %>% select(c("isocode", "rugged"))
colnames(rug) <- c("ISO", "rugged")
#scaling
rug$rugged_sc <- rug$rugged / max(rug$rugged)
#merge
aei <- merge(aei, rug, by = "ISO", all.x = TRUE)
rm(rug, rugged)
hist(aei$rugged_sc, breaks = 100)
Ok, so lets peek at how things change over time. Here is a graph of percentage of country area equipped for irrigation from 1960 - 2005.
pirrfrac <-
ggplot(data = subset(aei, !is.na(irrperc)),
mapping = aes(x=year, y= irrperc, group = country, color = six_regions)) +
geom_line() +
scale_x_continuous() +
theme(panel.grid = element_blank())
#run this separately, otherwise the plotly part doesn't work.
ggplotly(pirrfrac)
Although this graph has a lot of countries and can get quite messy, some general trends appear. South Asian countries seem to have high percentages of irrigated area per total land area (irrigation percentage). Then seemingly followed by East Asian and European countries. Africa and the Middle East do not seem to be irrigated as much. Be aware that you can these graphs are interactive, you may zoom, pan or click on the legend to isolate the regions. Although, a regional breakdown follows.
In general, Africa has low levels of irrigated area. Swaziland, Sao Tome and Principe, Reunion and Mauritius lead here with majority of others falling below the levels of these countries.
Europe contains some countries that reach higher percentages of irrigated area. Countries such as the Netherlands, Romania, Albania, and Azerbaijan have reached peaks in the past (roughly 10-15% irrigated area) but seem to be in a decline towards the end our study period. Other countries such as Italy and Denmark have had more stable rises with less down turn toward. Unexpectedly, Russia has a very low percentage of irrigated area, however at this point, the assumption is that either Russia’s enormous size cancels out its irrigated area, or that Russia receives enough precipitation for the crops it grows that irrigation is not necessary. Further analysis should reveal this.
In the Americas, the % of area irrigated by countries is lower than Asia or Europe. Peaks here are Cuba and the Dominican Republic at around roughly 6-8% of their total countries being irrigated. There are some islands (Barbados and St. Lucia) that have sharp increases in irrigation fraction, but this could be because they are islands and any irrigation at all would induce the ration of irrigated land to total land spike. Counties that have experienced a steady increase, but are not the leaders in overall irrigation fraction are countries that are big and produce a lot of produce (USA, Mexico, Chile) The general trends of irrigation increase seem to be upward with perhaps a slight leveling off toward the end of the study period (for some countries) but lacking the distinct downturn that is seen in Europe.
The East Asian countries are quite high in % of irrigated area, when compared to the Americas or Africa. The four countries that reach upwards of (roughly) 10% or more of their total area irrigated toward the end of the study period are Vietnam, North Korea, South Korea, and Thailand. Other countries that have high irrfracs are China, Japan and the Philippines. It is expected that for this region the irrigation fraction would be high due to the amount of rice produced.
Oceania, due to its large amount of islands, does not have a lot of irrigation, majority of countries have 0% irrigation. New Zealand (around 2%) and Australia (less than 1%) have some irrigation. Although this could have a multitude of reasons why irrigation is so low here, regardless of the fact that Australia and New Zealand are very populous countries with mouths to feed. Australia is very large and perhaps its enormity cancels out the irrigated land it does have.
Very few countries here, countries in this area are large. Very high levels of irrigation here coming from most countries. Bangladesh, Pakistan, and India are heavily irrigated compared to their land area.
Irrigation is also prevalent here (but not as much as in south Asia). These countries are dry and therefore would need irrigation to have domestic food supplies. Lebanon, Iraq and Israel lead here. Oil producing countries have little irrigation, they are dry and their GDP is high, perhaps importation is the easiest option for them?
Overall, this might be a good overview of what irrperc looks like for each region.
theme_set(theme_minimal())
ggplot(subset(aei, !is.na(irrperc)),
aes(x=six_regions, y=irrperc, fill=six_regions)) +
geom_boxplot(show.legend = FALSE) +
coord_flip() +
theme(legend.title = element_blank())